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Abstract. 



We present a Bayesian surrogate model for the analysis of periodic or quasi- 
periodic time series data. We describe a computationally efficient implementation 
nj ' that enables Bayesian model comparison. We apply this model to simulated and 

real exoplanet observations. We discuss the results and demonstrate some of the 
challenges for applying our surrogate model to realistic exoplanet data sets. In 
particular, we find that analyses of real world data should pay careful attention to 
the effects of uneven spacing of observations and the choice of prior for the "jitter" 



C/3 . parameter. 
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Tt ■ 1 Motivation 8i Overview of Exoplanet Observations 



Since the 1990's, nearly 500 extrasolar planets (or exoplanets) have been discovered 
around other stars in our galaxy, yet only a few of which have been observed directly. 
In all other cases, the planet's presence has been inferred from its influence on the light 
of its host star. 



^ ■ 1.1 Doppler Observations 



The most productive such method to date has been observing the Doppler shift of the 
star light due to the gravitational perturbations of the planet. For a single planet 
on a circular orbit, the Doppler signature arises from a sinusoidal variation in the 
star's velocity along the line of sight. More generally, a single planet causes a periodic 
variation in the stellar velocity {v{t)) that follows the shape predicted by Kepler's laws 
of planetary motio n, which can usually be w ell-approximated by the first few terms of a 



Fourier expansion (jKonacki and Macieiewski 1999). If the amplitude of the fundamental 



term (with frequency / = Stt/P, where P is the orbital period) in the Fourier expansion 
of v{t) is Kq, then the coefficients of harmonic terms (with frequency nf) are of order 
e'^Ko, where e is the orbital eccentricity. Given the typical signal-to-noise of detections 
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2 Bayesian Surrogate Model for Time Series Analysis 

and the typical exoplanet eccentricity of less than 0.3 (and often less than 0.1), the use 
of as few as two terms in this Fourier expansion is often accurate to within observational 
uncertainties. If a star has multiple planets, its Doppler signature can be much more 
complex. In some cases, the observed stellar velocities (vk) can be well approximated 
as the linear superposition of multiple planets on Keplerian orbits. In other cases, 
the planet-planet interactions cause effects comparable to the overall amplitude of the 
signal and thus must be considered. In any case, the observational signature is quasi- 
periodic. A common question is whether a Np planet model is sufficient to describe the 
available observations or whether the data demand at least Np + 1 planets. While an 
exploration of the full physical parameter space would be computationally prohibitive, 
a lower dimension al surrogate model can be quite useful for analyzing such a system 



( Veras et al.ll2011[ ) 



1.2 Transit Timing Variations 

Recently, an extremely promising new method of detecting exoplanets has burst on to 
the scene. As a planet passes in front of its host star (i.e., a transit), the star's brightness 
appears to decrease. In an idealized two-body system, the mid-times of the transits 
(i/c) are strictly periodic at the orbital period (P), so the fcth transit occurs at time 
tk = to+kx P. If t here are additional pl anets, then th e times of the transits deviate from 
a linear ephemeris ( Agol et al. 2005; Ho lman and Mur rav 2005; Fo rd and Holmanll2007l ). 



The perturbations of an additional planet can cause deviations of the transit times that 
are simply sinusoidal in time, a periodic non-sinusoidal p attern or a very complex quasi- 
periodic waveform (Figure [H [2l Ford and Holman 2007; Nesvornv and Morbidellill2008 : 



Nesvornvll2009t iNesvorny and Beaugell2010[ ). While it is impractical to explore all the 



parameters of a full physical model, a lower dimensional surrogate model may be able 
to help identify regions of parameter space that merit further investigation with a full 
physical model. 

1.3 Relation to Previous Research 

In this manuscript, we present a new method for analyzing periodic time series data 
using a computationally efficient Bayesian surrogate model. The details of our model 
are chosen to facilitate the analysis of exoplanet observations. We test our model by 
analyzing Doppler and transit timing data sets. Thanks to the computational efficiency 
of our algorithm, it is possible to apply it to a large library of simulated data sets to 
understand how the model performs for different types of planetary systems. 

Astronomers routinely apply Markov chain Monte Carlo (MCMC) techniques to 
perform Bayesian para meter estimation when an alyzing Doppler observations of an ex- 
oplanet host star (e.g.. lFordll2005t lGregorvll2005l ). For multiple planet systems, MCMC 



methods are computationally intensive, even when the model evaluation is performed 
ignoring the gravitational interaction between the planets. While this is a good ap- 
proximation for analyzing the Doppler observations of many systems, mutual planetary 
interactions can be quite significant for planetary systems near a mean motion resonance 
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(e.g., GJ 876: iLaughlin et al.ll2005 ). Full n-body integrations to account for these mu- 



tual planetary interactions are much more computationally demanding. One approach 
is to develop and paralleli ze computationally effi cient methods that allow one to use full 
n-body integrations (e.g.. Ijohnson et al.l 120111 ). While the GJ 876 data set has many 



high signal-to-noise observations, most exoplanet host stars have fewer observations 
and lower signal-to-noise, resulting in weak constraints on many physical model param- 
eters and making it even more difficult to sample these parameter spaces using MCMC. 
The use of a lower-dimensional surrogate model has the potential to contribute to the 
analysis of such systems by identifying the periodicities that are statistically significant 
without introducing several additional parameters that are poorly constrained. Simi- 
lar methods are routinely a pplied in a frequentist context to identify planet candidates 



from Doppler observations ([Gumming et al.l 120081 ). Our Bayesian surrogate model can 



be thought of as a Bayesian generalization of the Lomb-Scargle periodogram ( Gummina 
(2004) that has been further generalized to allow for multiple frequencies, perhaps due 
to t he perturbations from addition al planets or perhaps due to significant eccentric- 



ity (JKonacki and MaciejewskilllQQQI ). Previously, a much more restricted version of the 



surrogate model {Nf^max = 1, Nd.m ax =0, se e §2.2|) was used to evaluate strategies 
for scheduling Doppler observations (|Fordll2008l ) . The generalization in this manuscript 



allows for identifying multiple periodicities, as is necessary for application to eccentric 
and/or multiple planet systems. 

We are optimistic that the surrogate model has even more potential for contributing 
to the analysis of transit timing variation data. In the transit timing variation method, 
the entire signal is due to the mutual gravitational perturbations. Given the highly non- 
linear nature of the problem (particularly near resonances) , a physical model requires 
performing computationally expensive n-body integrations. While it might be practical 
to perform MGMG sampling around one mode of the posterior distribution while using 
full n-body integrations, it is not practical to perform a brute-force glob al search of 



the h igh-dimensional parameter space while using full n-body integrations (|Veras et al. 
120111 ). The evaluation of our surrogate model is orders of magnitude faster than an 
n-body integration. Additionally, the surrogate model is linear in most of its model 
parameters, allowing for efficient identification of the modes and integration over linear 
parameters, once we condition on the non-linear parameters. (We perform integration 
over non-linear parameters via brute force, as described in the supplementary materi- 
als.) The speed of the surrogate model makes it well-suited to exploring a broad range 
of possible orbital configurations. Once the surrogate model identifies significant peri- 
odicities, n-body integrations can be used to perform a more detailed exploration of the 
full physical models in regions which could produce the periodicities identified by the 
surrogate model. We present results of our Bayesian surrogate model applied to simu- 
lated transit timing data and discuss the implications of our results for the prospects of 
transit timing-based planet searches. 
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2 Bayesian Surrogate Model for Analysis of quasi-Periodic 
Time Series 

2.1 Data 

First, we describe our general model for analysis of quasi-periodic time series. In the 
case of Doppler observations, the independent variable, x would correspond to time 
(i) and y would correspond to the star's velocity {v{t)). In the case of transit timing 
observations, x would correspond to the transit number and y would correspond to 
the mid-transit time. Each observation (j/fe) is accompanied by an estimate of the 
measurement uncertainty (ak)- The independent variables (i.e., transit number or time 
of each observation) are assumed to be known precisely. 

2.2 Model 

We explore the use of a surrogate model given by 

y{xk]0) = ^[5'iSin(27r /^a;*;) + C^ cos (27r fiXk)] + ^Dix\, (1) 

where Xk is the independent variable for the kt\\ observation and and y(xk',0) is the 
predicted value of the observable based on the surrogate model with parameters 6. The 
surrogate model parameters 9 include: 1) the number of frequencies in the surrogate 
model (Nf), 2) the frequencies (/i's), 3) the amplitudes for those periodicities (S'i's and 
Ci's), 4) the order of the polynomial terms (Nd), and 5) the polynomial coefficients 
(Di's). This model exactly describes time series which are the superposition of poly- 
nomial and sinusoidal signals. The surrogate model can be used for Bayesian model 
comparison to determine how complex a model (i.e., how many frequencies and/or 
polynomial terms) is justified for a given data set. 

In principle, one could consider alternative basis functions. We favor the use of 
sinusoids since they exactly describe the gravitational perturbation of a planet following 
a circular orbit (and non-interacting with other planets). Further, a sinusoid often 
provides a good first approximation to the perturbation of a planet on an eccentric orbit, 
given typical eccentricities and measurement precision. In practice, we will truncate 
the model to use just a few frequencies and polynomial terms, so as to provide an 
acceptable model for the observations while facilitating the eflficient evaluation of the 
model. If observations span a sufficiently long period of time, then there is no need 
for polynomial terms. In practice, the orbital period of an outer planet (e.g., Saturn 
29.5 years, Neptune 165 years) can be much longer than the time span of observations 
(typically ~1-10 years). If the orbital period is much longer than the time span of 
observations than a simple linear term (constant radial acceleration for Doppler data) 
is all that can be discerned from the available data. In the case of a planet on a circular 
orbit, the "jerk" (i.e., derivative of acceleration) becomes significant as the time span of 
observations approaches a quarter of the orbital period. For the general case of a planet 
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on an eccentric orbit, the jerk can become evident after a greater or lesser duration, 
depending upon the orientation of the orbit and phase of the observations. In either case, 
we cannot infer the physical parameters with just two derivatives measured, as these are 
insufficient to characterize an orbit that requres more parameters to fully specify (seven 
physical parameters in general, and at least three even if we arbitrarily assume a circular, 
coplanar orbit). In these cases, using only one or two polynomial terms dramatically 
increases the computational efficiency, since they eliminate the need to explore a large 
number of frequencies. Another advantage of the polynomial basis is that a simple linear 
term corresponds to the Doppler perturbation from a body sufficiently distant that its 
orbital motion is insignificant over the time span of observations. A quadratic term 
corresponds to physical motion of the perturber over the time span of observations. 

In cases where the time span of observations is sufficient to discern a third polynomial 
term, it becomes more difficult to justify a polynomial model. On one hand, a cubic term 
is still much less computationally demanding than considering an additional harmonic 
term (see §2.5p . On the other hand, according to our physical model, all signals are 
periodic on sufficiently long timescales and three derivatives are sufficient to constrain 
the range of plausible periods and amplitudes. While three derivatives are not sufficient 
to infer the shape of the waveform, in many cases (e.g, a distant planet on nearly 
circular orbit) a simple sinusoid is a reasonable first approximation. Therefore, we favor 
considering a model with an additional harmonic term over including three or more 
polynomial terms. 

2.3 Likelihood 

We take the likelihood of each observation to be 



Lkixk,yk,o-k\0) = N{yk ~y{xk;9),al + (Jj) (2) 



where 7V(/i, cr^) is a normal distribution with mean /i and variance a^. Here 9 — 
{Nf,fi,...fNf,Si,...SNf,Ci,...CNf,Nd,Do,...DNa,<Jj}- Note that we have expanded 
the set of model parameters (9) to include (Tj, a "jitter" parameter that is related the 
amount of scatter that is not accounted for by the observational uncertainties. The 
origin of the jitter need not be specified. It could be due to inaccurate estimates of 
the observational uncertainties or physical effects that are not included in a particular 
model (e.g., star spots, p- modes, additional planets). 

We assume that the observational errors are uncorrelated in time, so the likelihood 
for a data set of Nobs observations is given by 



Nab. 

L{9)^Y[Lkixk,yk,'Tk\9). (3) 

/c=l 
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2.4 Priors 

In principle, one might aim to choose priors that are based on the distribution of masses 
and orbital periods for exoplanets. Of course, characterizing those distributions is one 
of the primary motivations to conduct exoplanet searches. While exoplanet searches 
have detected hundreds of planets, most of these h ave masses comparable to Saturn or 



greater and orbital periods of a few years or less (jCumming et al.ll2008f) . Inevitably, 
astronomers pushing the frontiers of knowledge (e.g., searching for less massive planets) 
will not know the intrinsic distribution of those planets' orbital properties. Thus, we 
choose broad priors based on physical intuition and mathematical principles, as outlined 
in this section. 

For the number of frequencies in the surrogate model, we adopt the following prior: 
piNf) = {l-2a + a^_J/(l - a) for Nf = 0, p{Nf) = a^/ for < Nf < Nf,ma., 
and p{Nf) — for Nf > Nf^max, where a parametrizes our prior belief about the 
likelihood of multiple frequencies in the signal. The maximum number of frequencies to 
be considered {Nf^max ^ 1) is chosen so as to provide enough complexity to model the 
data while keeping the model evaluation practical. The choice of a geometric probability 
for an increasing number of frequencies has the advantage that the prior ratio p{Nf ~ 
n + l)/p{Nf — n) is independent of n (as long as n + 1 < Nj^rnax)- If we associate 
each frequency with one planet and consider planet detection serially (i.e., first look for 
evidence of one planet, next look for evidence of another planet), then the minimum 
Bayes factor to have a significant detection of each successive planet is constant. 

The prior for the frequencies themselves are given by p(log fi) ^ t^(log /min, log /max), 
where t/(a, b) is a uniform distribution between a and b. For our applications, there 
are physical limits on the range of viable frequencies. For example, /max could be set 
by the shortest orbital period in which a planet would be able to survive for the age of 
the star, and /min could be longest orbital period in which a planet would be able to 
remain bound for the age of the star given perturbations from the galactic tidal field and 
passing stars. In most cases, the time span of observations (Tots) will not be sufficient 
to distinguish such long-period signals from a low-order polynomial. Thus, we typically 
set /min ^ '2 /Tabs, 8-3 the Surrogate model is still able to model slow variations and 
this aids in the rapid evaluation of the surrogate model. Our choice of a prior that is 
flat in the logarithm of the frequency is motivated by the maximum entropy prior for 
a scale parameter. We also experimented with a modified Jeffrey's prior. We found no 
significant difference in the results, as long as there is a significant detection. This can 
be understood simply in terms of the characteristic width of the peaks in the likelihood 
are much smaller than the domain of the frequencies. Thus, as long as the prior for 
frequency has significant support across the entire domain, the choice of the prior for 
frequency has minimal impact on the shape of the posterior (in locations with signifi- 
cant posterior probability) unless there is not significant evidence for any periodicities. 
While we do not attempt to justify our choice of prior based on the distribution of 
exoplanet orbital periods, we note that this alternative approach would result in a fairly 
similar choice of prior, at least for frequencies between ^1/(2 days) and ^1/(2000 days) 



(jCumming et al.ll2008l ) 
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Each of the frequencies (/i) has an ampHtude Ai = y/Sf + Cf and a phase (j)i = 
atan2(— S'i, Ci). We choose a uniform prior for each phase, p((/'i) = 1/ (Stt). Physically, 
this corresponds to time invariance; i.e., the other planetary system does not know what 
time we choose to label as t — 0. For Ai, the total amplitude at frequency i, we adopt 
a modified Jeffrey's prior, p{Ai) ~ (1 + Ai/Ao) for < Ai < A,nax, where Amax 
is the maximum plausible amplitude. For application to Doppler observations, Amax 
could be set by the maximum velocity perturbation by a planet (or binary star). For 
application to transit timing variations, A^ax could be set by the orbital period. Note 
that, in principle, A^ax could be a function of fi. The other parameter, Aq, prevents a 
divergence of the prior at small amplitudes. For some applications Ao may be physically 
motivated. For our applications, we choose Ao based on the minimum detectable signal 
based on the available data set, e.g., Ao ^ (l/cr,^) , where (l/crf ) is effectively 

the weighted average measurement precision. In practice, we find that any sufficiently 
small choice of Ao and large choice of ^max gives similar results for parameter estimation 
for a given Nf. The choice of Ao, ^max and a do affect the marginalized posterior 
probability ratio for Nf. 

For our typical applications, the total signal amplitude is proportional to the mass 
of the planet. While we do not attempt to justify our choice of prior based on the 
distribution of exoplanet masses, we note that this alternative approach would result in 
a fairly similar choice of prior, at least for readily detectable planets (e.g., more massive 
than Saturn for Doppler surveys). Present observations are only beginning to provide 
significant constraints on the dist ribution of planet masses for low-mass planets at sma ll 
short orbital periods (<50 days) (|Howard et al.lbOlOtlBorucki et al.ll201ll : lYoudinll201ll ). 



We implement the above by using a "two-dimensional modified Jeffrey's prior" for 
each pair of amplitudes Si and C,; , 



p{s,,a)^i/ 



ATrJSf + Cf log (1 + AnaxMo) 1 + JSf + Cf/Ao 



(4) 



ioiJS, 



a.ndp{Si,Ci) — for y/Sf + Cf > Amax- InspectionofEqn.|3]shows that it is essentially 
a modified Jeffery's prior for the total amplitude Ai. 

For the number of polynomial terms in the surrogate model, we adopt a prior similar 
to that for the number of frequencies: p{Nd) = (1 -2/3-h/3^'*'""'""^)/(l -/3) for Nd = 1, 
p{Nd) = /S^-^-i for 1 < Nd < Nd,ma., and piNd) = for Nd > Nd,max, where /3 
parametrizes our prior belief about the likelihood of higher order polynomial terms being 
present in the signal. For our applications, the maximum polynomial order considered 
{Nd^max > 1) is typically set to 1 and rarely more than 2. The choice of a geometric 
probability for an increasing number of polynomial terms has the advantage that the 
prior ratio p{Nd — n + l)/p{Nd — n) is independent of n (as long as n -I- 1 < Nd^max)- 

For both the jitter and the polynomial coefficients, i.e., each of i? e {Di, aj), we 
adopt a modified Jeffrey's prior, 

p{B)^l/[2\B\\og{l + B^,^/Bo){l + \B\/Bo)], |-B| < B,„ax, (5) 
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where Bq and -Bmax are analogous to Ao and Amax- We adopt the same values as 
for Ao and Amax- For some data sets, it may be wise to choose a smaller Bo, as the 
minimum measurable magnitudes for the I?i's depend on the effective measurement 
precision, the number of observations and the time-span of observations. In many 
cases, the amplitude of long-term trends is proportional to the perturbing planet's 
mass, motivating a prior p{Di) that is concentrated at small signals and that has a 
similar shape to the prior for the amplitudes. We also tried using a uniform prior, 
p{Di) = l/(2Amax) for \Di\ < Amax- In practicc, the likelihood is very sharply peaked 
in Di, so results are not sensitive to the choice of its prior. We have only begun to 
experiment with the surrogate model for Nd = 2 (based on discussion at end of H2.'2\\ . 
and caution that further experimentation may be needed for models with Nd > 2. 

In this manuscript, we present results based on a modified Jeffrey's prior, as the 
jitter is a scale parameter and we have physical reason to limit p{<Jj) at small values 
of (Tj, once it becomes small compared to the measurement precision or astrophysical 
effects that can cause non-gravitational perturbations to the Doppler or transit timing 
signal (e.g., non- uniform star spots). Howe ver, we found that in s ome cases our results 



could be sensitive to the prior for the jitter (jPavne and Fordll2011[ ). For example, either 



imposing an upper cutoff on the prior for gj at ~ lOm/s or using a normal distribution 
for p{<Jj) can significantly increase the posterior probability for Nf — n + \ relative 
to that of Nf = n or narrow the range of allowed amplitudes. Previous studies of 
the empirical distribution of Doppler jitter are based on relatively small samples sizes 



(|WrightJl2Q05l ). so they have little to say about the tails of the distribution. Based on 
our results, we encourage further observations and statistical analyses that could inform 
the choice of prior for aj . 

2.5 Numerical Evaluation of Model and Posterior PDF 

The surrogate model was designed to provide a good approximation to Doppler or tran- 
sit timing observations and is likely to provide a reasonable approximation of many 
other time series. A second important feature of the surrogate model is that it per- 
mits efficient evaluation. Several tricks to perform an efficient brute force integration 
are described in the supplementary material. A key feature of the model is that for 
given values of Nf, Nd, /i's and aj, the model is linear. Thus, the integration over the 
remaining parameters can be performed via linear algebra and the Laplace approxima- 
tion. Marginalizing over aj and the /i's can be performed using standard numerical 
techniques. By evaluating the model conditioned on Nf and Nd and marginalizing 
over the remaining parameters, one can compare the marginal posterior probabilities to 
identify values of Nf and Nd that provide a good model for the observations without 
introducing more model parameters than are justified given the available data. For our 
typical applications, the posterior dominated by small values of Nf and Nd, so higher 
values need not be considered explicitly. Since the surrogate model can be integrated nu- 
merically, it provides a quantitative basis for Bayesian model comparison and/or model 
selection. 
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3 Discussion of Application to Exoplanet Observations 
3.1 Doppler Data 

One common challenge for exoplanet searches is deciding when the available data pro- 
vides sufficient evidence to constitute a planet discovery. This is particularly challenging 
in the case of stars with multiple planets, as the data can always be better modeled 
by adding additional planets. The Fourier decomposition of the Doppler signature of a 
planet is dominated by power at the frequency (1/P) corresponding to the orbital pe- 
riod (P). The power at the harmonic frequencies, 2/P, 3/P, ..., X/P, decreases as e"^"^, 
where e is the orbital eccentricity. The eccentricity for an ellipse is constrained to (0, 1) 
and most known exoplanets have eccentricities smaller than 0.15. Thus, applying the 
Bayesian surrogate model to Doppler data sets is expected to result in a posterior dis- 
tribution for the most significant frequency (/i) near one over the orbital period of the 
planet which dominates the Doppler signature. The second most prominent frequency 
(/2) could correspond to a harmonic of the first planet or to the orbital period of a sec- 
ond planet. Some planetary sy stems contain two planets with orbital pe r fods that differ 



by a f actor of nearly 2 (e.g., iLaughlin et al.l l2005t IWright et al.M2011l : iLissauer et al. 



2011bl ). For certain planet masses and eccentricities, the Doppler signa ture of two such 
planets can be mimicked by one planet with a more eccent ric orbit ( Giuppone et al.l 
2009t lAnglada-Escude et al.ll2010t iMoorhead and Fordll2010l) . If there is actually only 
one planet, then one would expect the next most prominent frequency to correspond 
to twice the fundamental frequency (/2 — 2 x /i) to within measurement precision. 
On the other hand, for systems that actually contain two planets, the sidereal orbital 
periods often deviate from exact resonance ([Lissauer et al.ll2011br ). One might hope 
that in cases where the ratio of orbital periods differed from two, that the surrogate 
model could recognize this difference {S = /2 — 2 x /i), so one could infer that the 
obser vations were due to two planets, rather than one planet on a more highly eccen tric 
orbit (|Giuppone et al.l 120091: lAnglada-Escude et al.|[2010l iMoorhead and Fordll2010f) . 



To explore this possibility, we applied the Bayesian surrogate model to several sets 
of Doppler observations of exoplanet host stars. We focused on exoplanets with a large 
velocity amplitudes and believed to have a significant eccentricity, as those systems 
provide the best prospects for measuring the harmonic frequencies precisely. In partic- 
ular, we choose systems with Ke^ > 3m/s, where K is the velocity amplitude and e is 
the orbital eccentricity. In all cases, the surrogate model efficiently found the funda- 
mental frequency corresponding to the orbital period. As expected, /i is very tightly 
constrained and the marginalized posterior for additional frequencies are significantly 
broader. In some cases, we found that the marginalized posterior for /2 did not corre- 
spond to 2 X /i. 

In order to determine how often /2 deviated from 2 x /i by chance, we performed a 
similar analysis on several simulated data sets. Each simulation was modeled on simu- 
lated velocities that were calculated according to the best-fit orbital period, amplitude, 
eccentricity, arrangement of pericenter and orbital phase. We added Gaussian measure- 
ment noise with a scale set by the claimed measurement uncertainty. The results for one 
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case (HD 162020) are shown in Figures [3] & |31 If we use the same observation times and 
uncertainties as the actual observations, then the marginal posterior distributions for 
P2 = I//2 and Pi/2 ~ 2//i do not overlap. If we generate a similar data set, but with 
random observation times, then the marginal posterior distributions for P2 = I//2 and 
Pi/2 = 2//1 do overlap. We conclude that one must be very cautious of aliasing due 
to unevenly spaced observations when interpreting the marginal posterior distributions 
for /,;'s for realistic data sets. 

Our results suggest that the problems of aliasing could be reduced by obtaining 
regularly (or randomly) spaced observations. Unfortunately, this is impractical for ob- 
servations made from the surface of the Earth. First, observations can not be made 
when the Sun is above the horizon (or even less than ^ 12° below the horizon) due to 
scattering of sunlight by Earth's atmosphere. As the Earth revolves around the Sun, the 
time of day at which a given star can be observed from a given site changes. For most 
stars, there are multiple months each year when high-precision Doppler observations 
are not possible, since the star appears too close to the sun (after projecting onto the 
sky). Thus, for most stars, Doppler observations are prone to aliasing at frequencies as- 
sociated with the solar day and the solar year. (In principle, observations of stars near 
the North or South pole from an observatory in the Arctic or Antarctic could avoid 
aliasing near the day. However, there are no observatories with high-precision Doppler 
capability near either pole due to logistical issues.) Second, while planet host stars 
are faint compared to the daytime sky brightness, they are much brighter than distant 
galaxies. Therefore, time allocation committees assign the vast majority of observing 
time near new Moon to extragalactic astronomers. Exoplanet searches are typically are 
assigned observing time near full Moon, introducing aliasing at frequencies associated 
with the lunar month (~ 29.5 days). In practice, the best way to avoid this is to 
dedicate an observatory to Doppler observations (e.g., the HARPS instrument at the 
European Southern Observatory's 3.6m telescope in La Silla, Chile). Of course, this 
requires considerable resources and is not an option for the world's largest telescopes. 
Third, astronomers often attempt to optimize the efficiency of their observations on a 
given night by observing each star when it is near the greatest (angular) altitude in the 
sky that night, as this minimizes the amount of absorption of starlight by the Earth's 
atmosphere. This strategy introduces yet another frequency associated with the sidereal 
day (i.e., the time for a star to return to the nearly same point on the sky from a given 
location, roughly 23 hours, 56 minutes and 4 seconds). While these may seem like picky 
details, each of these timescales can be found in public data and in some cases con - 
tribute to qualitative ambiguities in the orbital solutions ( Dawson and Fabryckyll2010l ). 



While attention to scheduling can help improve efficiency of planet searches, ultimately 
weather (i.e., cloudy skies) will lead to data gaps and prevent optimal experimental 
design for any Earth-based observatory. Space-based observatories are orders of mag- 
nitude more expensive to construct and operate. Therefore, we must develop tools to 
analyze realistic data sets. Studies such as this will help us to use those tools responsibly 
and reduce the risk of making erroneous claims. 
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3.2 Transit Timing Variations 

The rapidly increasing number of known exoplanets that transit their host star has 
led several observers to measure transit times in hopes of detecting deviations from a 
linear ephemeris due to perturbations by another (p otentially non-transiting) planet 
(e.g.. iMiller-Ricci et alJuOOSt iMacieiewski et al-lbOlOf) . In most previous studies, the 
number of precisely measured transit times has been insufficient to fit a physical model. 
The prospects for the transit timing variation method are poised to improve dramat- 
ically thanks to NASA's Kepler mission, which is observing over 100,000 stars nearly 
continuously for 3.5 years (|Borucki et al.luOllt ISteffen et al.ll2010l ). Indeed, Kepler re- 
cently reported the first strong evidence for transiting timing variations in Kepler-9 and 
Kepler- 11 systems ( Holman et aLlbOlOt iLissauer et al.ll2011a ). Interestingly, in both of 
these cases, transit timing variations were used to confirm planet candidates that had 
already been identified via transit and to constrain their masses and orbits. Kepler 
has also identifie d dozens of planet candidates with putative transit timing variations 
(JFord et al.ll201lf ). However, further observations are needed in order to infer the mass 
and orbital properties of the perturbing bodies. 

In principle, one cou ld model the full light curve (i.e., observed brightness versus 
time: ICarter et al.ll2011l ). However, this would vastly increase the computational time 
required and most of the information is contained in the transit time. (Using data from 
NASA's Kepler mission, there are typically ^ 10"^ — 10^ brightness measurements for 
each transit time.) The transit time is the most precisely measured parameter for each 
transit, and the transit times are sensitive to whether the transiting planet is slightly 
ahead or behind "schedule" due to gravitational perturbations from other planets. The 
next best-measured parameter is the trans it duration which dep end on the orbit of 
the transiting planet and the stellar radius ([Moorhead et al.ll201l[ ). The depth of each 
transit is primarily determined by the relative sizes of the planet and host star and is 
not affected by gravitational perturbations of other planets. The detailed shape of each 
transit also depends on detailed stellar properties ("limb darkening parameters"). For a 
system of non-coplanar planets, transit duration variations may be detectable. We have 
focused our analysis in this paper on coplanar systems viewed edge-on, so as to reduce 
the dimensionality of the parameter space to be explored. While more parameters are 
required to describe non-coplanar systems, in some cases it may be possible to derive 
additional constraint s on the orbits based on transit duration variations, or lack thereof 
( Holman et al.l 120101 ). Given the computational cost of modeling the full light curve, 
we recommend future research to develop tools to analyze a series of transit times and 
transit durations. 

To explore the potential for transit timing variations to enable the detection of a 
non-transiting planet, we have generated ~ 10^ simulated dat a sets with a wide variety 
of orbital periods, eccentricities and angles (jVeras et al.ll201ll ). We apply the surrogate 
model to simulated data sets to identify the dominant frequency and its amplitude. 
We find that the surrogate model can provide an accurate model for some data sets, 
particularly those very near a mean motion resonance, a regime whi c h is particular 1\ 
difficult to 



those very near a mean r notion resonance, a regirne wni c h is particularly 
approximate analytically ( Nesvornv and Morbidellil l2008t iNesvornvl 120091 : 
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Nesvornv and Beaugg 120101 ). In other cases, the transit timing variation signature is 



more complex and would require several frequencies to model adequately. For extended 
data sets, this can be challenging, both due to computation time and available memory. 

While the surrogate model can provide a reasonable approximation to many tran- 
sit timing signatures, the inferred model parameters depend sensitively on the orbital 
phases, as well as more basic physical parameters such as the planet mass and orbital 
period. Further, as one increases the number of observations, the inferred parameters 
often change significantly. This significantly complicates the interpretation of the sur- 
rogate model outputs. While the inferred model parameters will eventually stabilize 
with a sufficient number and time span of observations, we find that inferred param- 
eters can continue to change even after several years of observations. We conclude 
that it may not always be practical to invert transit t iming variations and infer th e 
mass and orbital properties of a non-transiting planet ( Ragozzine and HolmanI 120101 ). 



A more extended discussion o f implications for transit timing planet searches is pre- 



sented separately (jPavne et al. ; 2010; Veras ct al. 2011). Of course, our results do not 



prove that other analysis techniques can not invert transit timing variations. However, 
our surrogate model was designed to capture the most important aspects of the prob- 
lem. Thus, our results are suggestive that this problem may be more general. Since the 
original submission of this paper, the first confirmations of ex o planets via the transi t 
timing variation methods were published ( Holman et al.lbOlOt iLissauer et al.ll2011al ). 



In these cases, each of the detected planets transit the star, so transit timing varia- 
tions were used to confirm planet cand i dates that had already been identified by the 
standard transit technique. iFord et al.l (|2011[ ) showed that Kepler can be expected to 



measure transit timing variations for at least 12 systems with multiple transiting planet 
candidat es. Based on analysis of the frequency of multiple transiting planet candidate 
systems ( Lissauer et al.ll2011bl ). we expect that even more planets with transit timing 



variations will be significantly perturbed by a non-transiting planet. Indeed, based on 
the first four months of observations, Kepler has identified dozens of planet candidates 
with prospective transit timing variatio ns, most in system s with only a single transiting 



planet candidate. Both our results and lFord et al.l (J201ll) suggest that further observa- 



tions will be necessary before the masses and orbits of putative additional planets can 
be determined. 



3.3 Conclusions 

We developed a Bayesian surrogate model for analysis of time series data in general and 
applied this model to two types of exoplanet search data, Doppler and transit timing 
variations. The surrogate model can be evaluated very rapidly and we describe a method 
for efficiently integrating over most model parameters. This allows for calculating the 
(properly normalized) marginalized posterior probability and assessing the posterior 
probability for a given periodicity. 

One strength of the surrogate model is for exploratory data analysis. For exam- 
ple, astronomers routinely use the Lomb-Scargle periodogram to search Doppler data 
for a periodic signature of a planet and to identify the range of periods that should 
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be explored with a more detailed model. One limiting case of the surrogate model 
{Nf^rnax = 1 and Nd^max = 0) djrectly corres ponds to the Bayesian generalization of the 
Lomb-Scargle periodogram ( Cumming||2004 ) . When a planet has a large eccentricity or 



one star hosts multiple planets, the Lomb-Scargle periodogram typically reveals multi- 
ple significant periodicities. Previously, astronomers have dealt with this by applying 
the Lomb-Scargle periodogram to the residuals after subtracting the best-fit sinusoidal 
or Keplerian model. This approach can bias subsequent results, since the subtracted 
model is not exact. Further, assessing the significance of peaks in the periodogram of 
residuals is nontrivial. Most authors use a blind approach when searching for additional 
periodici ties, but others favor u sing informat ion about the frequencies previously identi- 
fied (e.g.. lKonacki and Macieie wski 1999; D awson and Fabrvckvll2010r ). In practice, this 



can lead to a combersome decission tree in a frequentist context. Our surrogate model 
(with Nf^rnax > 1) represents a Bayesian generalization of iterative frequentist methods 
for analyzing periodogram of residuals. The Bayesian surrogate model provides a rigor- 
ous basis for calculating Bayes factor of the marginalized posterior probability for the 
number of significant frequencies i3„+i^„ = p{Nf = n+l\x,y, a)/p{Nf = n\x, y, a). An- 
other advantage of the surrogate model is that by marginalizing over the other model 
parameters (e.g., frequencies, amplitudes, jitter, polynomial terms), a spurious false 
positive should be less likely than analyzing residuals to only the best-fit model. 

Finally, for many systems the surrogate model can provide a lower-dimensional model 
that still captures the important (i.e., observable) physical effects. For example, in a 
system of multiple low-mass planets, a full physical model has a dimension of ~ 7iVp, 
where Np is the number of planets. If the system has planetary orbits with small 
eccentricities and/or inclinations, then several of the model parameters may have no 
observable effect. For such systems, the surrogate model would be able to describe 
the system accurately using a lower-dimensional parameter space (~ SA'p), greatly 
increasing computational efficiency and perhaps increasing the sensitivity for detecting 
additional planets (due to the less extreme Occam's factor). 

The surrogate model is not meant to replace other tools for Bayesian parameter 
estimation and model selection. It is still beneficial to a pply MCMC (and variants) for 



estimation ana moaei selection, it is still oenenciai to a pply mumu ( and variants ) tor 
pararneter estimation usin g a more physical model (e.g.. lFordll2005tlGregorv 2005; FordI 



2006t I Johnson et al.ll201l[ ). Similarly, tools such as restricted Monte Carlo, importance 



sampling and paralle l tempering can be helpf ul for calculatin g Bayes factors using a more 
physical model (e.g.. lFord and Greg ory 20071: lGregorvll201 11) . Since these tools are much 



more computationally expensive than the surrogate model, they are most appropriate 
once a putative set of orbital periods has been identified (e.g., by periodogram analysis, 
surrogate model, or human inspection for sufficiently large signals). 

Thanks to the computational efficiency of the surrogate model, we were able to ana- 
lyze numerous simulated datasets including multiple planet systems, something that 
would not have been feasible using previously available Bayesian methods such as 
MCMC. For Doppler planet searches we find that realistic observing cadences can lead to 
significant aliasing that prevents precisely testing whether there is a harmonic relation- 
ship between measured frequencies. Given the close relationship of our surrogate model 
to the widely used Lomb-Scargle periodogram, our results also serve as caution regarding 
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the integration of results based on periodograni analyses (e.g.j Konacki and Macieiewski 
|l999; A nglada-Escude et al. 2010: Dawso n and Fabrycky 201^1 

For analyzing transit timing variations, we find the posterior distribution for the 
surrogate model parameters are sensitive to the exact orbital configuration. While the 
sensitivity to important physical parameters is advantageous, sensitivity to parameters 
that do not have dynamical significance makes interpretation of the posterior distri- 
bution for surrogate model parameters more challenging. Unfortunately, the transit 
timing signature often evolves on a timescale comparable to or longer than a realistic 
time spans for observations (e.g., 3.5-10 years for Kepler). This makes it impractical 
to build a library of possible transit timing signatures and the corresponding surrogate 
model outputs. The surrogate model may still be useful for establishing the significance 
of putative periodicities and/or long-term trends in transit timing data. In addition 
to the advantages of a Bayesian approach, the computational efficiency of the surro- 
gate model could be useful for analyzing large sets of simulated data sets to aid in 
interpretation of a transit timing variation planet search. 

In both cases, we find that qualitative results (e.g., whether the Bayes factor, Bn.n+i, 
for the significance of additional frequency is greater or less than unity) can depend on 
the choice of prior for the jitter parameter (cj). In this paper we used a mathematically 
motivated prior for aj . Our result suggests that practical application of the surrogate 
model would significantly benefit from further astronomical observations and statisti- 
cal aiialyses to assess the empirical distribution of the st ellar jitter for both Doppler 
(e.g.. I Wri ght 2005) and transit timing observations fe.g.. iHolman et al.l 120101) . Given 



the relationship of our surrogate model to non-Bayesian methods being employed by 
astronomers, our results also suggest caution in the interpretation of other results based 
on frequentist analyses (which typically assume a single fixed value of the jitter). 



4 Supplementary Material 

4.1 Practical Model Evaluation 

Integration over Linear Parameters 

The surrogate model is linear in the parameters: Si's, Ci's, and A's. For a given choice 
of Nf, Nd, /i's and aj, one can calculate the "best-fit" values of Sj's, Ct's, and Di's 
via simple linear algebra. The same linear algebraic operations allow the integrals over 
these linear parameters to be evaluated efficiently via the Laplace approximation (i.e., 
we expand the exponent about the best-fit parameters, ke ep the constant a nd second 
order terms, and extend the limits of integration to infinity: ICumminj ( 20041 )). For our 



applications, the posterior probability is typically a smooth function of CTj's, but can 
vary extremely rapidly with /^'s. Thus, we recommend fixing the choice of Nf, N^, and 
/i's, as that the integral over ctj can be evaluated efficiently with standard numerical 
integration techniques. (In the special case of equal measurement uncertainties, the 
integrals can be calculated analytically.) 
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Brute Force Integration over Frequencies 

Unfortunately, the integrals over the /i's must be evaluated numerically. We recom- 
mend discretizing these integrals and evaluating them via brute force . The number 
of frequencies to be evaluated (M) can be quite large (|Cummingj I20Q4I ) . Fortunately, 



there are several computational tricks that can speed up the calculation. In particular, 
most of the trigonometric functions can be computed using trigonometric identities to 
improve performance. We find that brute force integration over one or even two /^'s is 
practical for realistic data sets. (For large data sets, a large amount of RAM may be 
required for efficient evaluation.) 

Approximations for i\/lodels with IVIany Frequencies 

Unfortunately, brute force integration over more than two /^'s rapidly become pro- 
hibitive. Thus, we introduce the following approximation. We start by performing a 
brute force evaluation of the model conditioned on there being one frequency and then 
successively approximate the posterior conditioned on two frequencies, three frequen- 
cies, etc.. When approximating the model conditioned on there being Nf frequencies 
(and Nf > 2), we limit the set of /^'s with i < Nf that are considered to those which 
contributed significantly to the posterior probability for the model conditioned on there 
being Nf — 1 frequencies. That is we set a threshold (e.g., e — 10^**) and after evaluating 
the model conditioned on Nf = 1 frequencies, we store the mi frequencies which have 
the largest posterior probabilities and collectively sum to at least 1 — e of the posterior 
probability conditioned on there being 1 frequency. At this point, we have searched 
M X (1-1- mi) frequencies (rather than M^). Next, we estimate the marginal posterior 
probability for Nf = 3 by considering only those combinations of /i and /2 that have 
the largest posterior probabilities and collectively sum to at least 1 — e of the poste- 
rior probability conditioned on there being 2 frequencies. Thus, when evaluating the 
model conditioned on Nf frequencies, there are no more than M YiiJi '^i frequencies 
to be explicitly evaluated, much less than M^f . In principle, this approximation can 
be relaxed by performing a full search over two frequencies. In this case evaluating the 
model conditioned on Nf frequencies, requires no more than M^ Y[i=i ""^i frequencies 
to be explicitly evaluated, which may or may not be practical for a given data set. 

Truncating the Number of Frequencies 

For data sets which are well described be only one or a few frequencies, models which 
include a large number of frequencies will have a very small posterior probability due to 
the Occam's factor associated with the higher-dimensional model. Thus, there is little 
point in evaluating models with large Nf. Thus, we recommend successively calculating 
the posterior probability conditioned on Nf {p{0\Nf, Xk,yk, crk)) and stopping at Nf^stop, 

such that p{Nf = Nf^stop\xk,yk,o-k) < Y^iJo''"''^ Pi^f = '^\xk,yk,crk) 
and approximate the remaining models as p{Nf > Nf^stop\xk, yk,<^k) ~ 0. 
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Computational Cost 

The surrogate raodel can be evaluated much more quickly than an n-body integration, 
explores a lower-dimensional parameter space, and takes advantage of the linearity of 
the model in most of the model parameters. Nevertheless, the strong sensitivity to 
frequency dictates that we must perform a fine sampling in frequency. For example, 
consider a case of a ~ lOM^^rth-niass transiting planet with an orbital period near 
4 days and a small planet which is not observed to transit with a period near 8 days. 
With a modest eccentricity (0.1), the transit timing variations of the inner planet could 
be ~ 10 minutes, co mparable to the t iming precision for each transit for a typical 



Kepler planet host star (JFord et al.ll201ll) . Over 7.5 years of observations (possible with 
an extended Kepler mission), one would observe roughly 680 transits, allowing for an 
easy detection of such a single. If we assume a timing precision of 10 minutes and set 
/max to 1 days, then each integral over frequency requires considering M ~ 180, 000 
frequencies. With a single core of a AMD Opteron 275 processor (2.2 GHz), this takes 
~ 10, 16 or 37 seconds for models with Nf = 1, 2 or 3. Setting e = 10"'^ (see 
supplementary materials), we needed to compute 1 {Nf — 1), 11 {Nf — 2) and 16 
{Nf = 3) scans over frequency for both N^^ = and 1, requiring a total of 12 CPU- 
minutes for one system. Thus, the brute force exploration is relatively fast for a single 
model. For systems with no detectable signal, the time required decreases significantly, 
as Nf = 3 (or even 2) need not be explored. On the other hand, the computation 
time per system required grows significantly as the signal-to-noise increases, since the 
number of frequencies sampled (Af ) must be increased to avoid missing a narrow peak 
in the posterior density. Of course, in these cases, the signal is sufficiently large that 
fancy statistical methods are not necessary to detect the dominant signal. The speed of 
the surrogate model allowed us to analyze millions of simulated data sets and to explore 
the complex parameter space, using a cluster with hundreds of AMD Opteron servers 
at the University of Florida High-Performance Computing Center. 
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Figure 1: In tliis figure, we show the transit timing signature for a series of two planet 
systems, each extending for 3.5 years, the nominal mission lifetime for NASA's Kepler 
space observatory. In each case, transit times are for a 20 Earth-mass planet following 
an initially circular orbit with a star-planet separation of approximately 0.05AU. (One 
AU is the average distance between the Earth and the Sun). In each case, there is an 
additional 2 Earth-mass planet (which we assume is not observed to transit) following a 
slightly eccentric orbit (e2 — 0.1) and an initial mean star-planet separation of: 0.078AU 
(upper left), 0.080AU (upper right), 0.080AU (middle left), 0.082AU (middle right), 
0.084AU (lower left) or 0.086AU (lower right). These separations are near the location 
of the 1:2 mean motion resonance (~ 0.0794AU). Note that the vertical axis scale 
changes from row to row. Even these small changes in the orbital separation result in 
qualitative changes in short and long-term structure of the transit timing variations. 
Here we show simulated data from full n-body integrations with no data gaps and 
no measurement uncertainties. In practice, even Kepler misses some transits (e.g., 
due to data downlink with Earth, spacecraft abnormalities) and the transit timing 
measurements have noise of^ 10^"^ — 10~^ days, depending primarily on the brightness 
of the target star and the s ize of the planet. For son ie planetary systems (e.g., Kepler- 



01 tne target star ana tne s ize or tne planet, for sor ae planetary systern 
9 bfc c Holm an et a,l. 20101) or triple star systems ( Carter et al.ll2Qlll : 



^___ ,__ ,. ,_ --r-- -., ,, ,_ ISlawson et aL. 

2011 : Stcffe n et al.ll201l[) . the amplitude of the transit (eclipse) timing variations is 
much larger than Kepler's measurement uncertainties. For other planetary systems 
(e.g., Kepler-11 b-f iLissauer et al.ll2011bl ). the magnitude of transit timing variations 
are comparable to the measurement uncertainties. We expect the Bayesian approach 
and our surrogate model, to be most useful for such systems, once a sufficient number 
and timespan of observations have been collected. For many systems with no detectable 
transit timing variations, statistical methods such as those described here will play an 
important role in establishing the significance of n on-detection and the im plied upper 



limits for the mass of any perturbing planets (e.g.. lSteffen and Agolll2005r i. 
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Figure 2: This figure is similar to Figure [U except the the outer planet has been moved 
closer to the transiting planet. In each case, there is a 2 Earth-mass planet (which 
we assume is not observed to transit) following a slightly eccentric orbit (e2 = 0.1) 
and an initial mean star-planet separation of: 0.062AU (upper left), 0.063AU (upper 
right), 0.064AU (middle left), 0.065AU (middle right), 0.066AU (lower left) or 0.067AU 
(lower right). These separations are near the location of the 2:3 mean motion resonance 
(~ 0.0655AU). Note that the vertical axis scale changes from row to row. Again, even 
these small changes in the orbital separation result in qualitative changes in short and 
long-term structure of the transit timing variations. 
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Figure 3: In this figure, we consider a simulated data set based on the best-fit orbital 
parameters for the exoplanet HD 162020. We show the marginalized posterior prob- 
ability distributions for Pi/2 = 2//i (red; very narrow distribution) and P2 = I//2 
(blue; broad distribution) from a surrogate model with Nf = 2. Based on the Fourier 
expansion of the Doppler signature of a planet on a Keplerian orbit, one would expect 
that the surrogate model would yield marginalized posterior probability distributions 
for /2 which overlaps marginalized posterior probability for /i/2. We find that this 
is not necessarily the case, even for this simulated data set with high signal-to-noise, 
uncorrelated Gaussian measurement errors and accurate estimates of the measurement 
uncertainties. For this calculation, we have used the actual times of observations. 
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Figure 4: This figure is analogous to Figure |31 except that the observation times are 
chosen randomly. In this case, the posterior for P2 — I//2 (blue; broad distribution) 
overlaps with the posterior for Pi/2 = 2/i (red; narrow distribution), as expected for 
a Keplerian orbit. Contrast this with Figure |3] which uses actual observation times. 
We conclude that for realistic Doppler data sets, aliasing due to limited number of the 
unevenly spaced observations can result in the marginal posterior distributions for /i/2 
and /2 not overlapping. This poses a considerable challenge to testing the one-planet 
model based on the dominant frequencies. 



